Highlights

• This work proposes a method to forecast the freshwater production of solar stills from
conventional weather data.

• The dataset was obtained from 9 months of evaporation experiments and measurements.

• The forecasting model is built with machine learning and is considerably more accurate than
traditional models.

• By applying the model, the freshwater production of four cities was predicted with high accuracy
from their weather data.


Abstract 

Solar stills are considered an effective way to address the scarcity of drinkable water. However,
a method to forecast their production is still missing. Herein, a convenient forecasting model is
proposed that only requires conventional weather forecast data as input. The model is built with
the random forest machine learning method and optimized by a Bayesian algorithm. The data required
to train the model were obtained from daily measurements lasting 9 months. To validate the accuracy
of the model, the determination coefficients for two types of solar stills are calculated as 0.935
and 0.929, respectively, which are much higher than those of multiple linear regression (0.767) and
the traditional models (0.829 and 0.847). Moreover, the model is applied to predict the freshwater
production of four cities in China. The predicted production is shown to be reliable by the high
correlation (0.868) between the predicted production and the solar insolation. The forecasting
model could greatly promote the global application of solar stills.
Nomenclature:

RF        Random forest
BOA       Bayesian optimization algorithm
R²        The determination coefficient of the model
Ws        Wind speed (m/s)
Wp        Wind direction
Press.    Atmospheric pressure (Pa)
T         Air temperature (°C)
Tmax      Maximum value of air temperature (°C)
Tmin      Minimum value of air temperature (°C)
RH        Relative humidity (%)
AQI       The air quality index
BIF-SS    The solar still with an interfacial evaporation structure at the bottom and insulation
          foams at the sidewall
BSI-SS    The solar still with an interfacial evaporation structure at both the bottom and the
          sidewall


1. Introduction 


Seawater covers 70% of the earth, while freshwater is mainly distributed in glaciers, ice caps, and
underground. With the growth of population and industrial activity, the shortage of drinkable
water is a catastrophic issue facing the world. As seawater accounts for 97% of the water on the
earth, desalination is an effective solution to the shortage of freshwater.

Among the many desalination technologies, solar desalination is one of the most environmentally
friendly. Fortunately, areas where freshwater is scarce often possess abundant solar energy. The
solar still is a solar desalination technology that is easy to install and maintain, and it has
broad application prospects in remote coastal areas and islands. Given this, solar desalination has
received widespread attention in recent years. However, the daily production fluctuates greatly
with climatic conditions and is not easily forecasted.

Traditional models express the production as a function of a few important factors. Because of the
complexity of heat and mass transfer in reality, these simple functions cannot accurately describe
the heat and mass transfer process inside the solar still, and they offer limited guidance for the
design of solar stills. Recently, machine learning has emerged as an effective way to predict the
performance of solar stills, for example the multiple linear regression (MLR) method, the
artificial neural network (ANN) method, and the random forest (RF) method. Among current
algorithms, RF is an ensemble learning algorithm based on decision trees, with unexcelled accuracy
[29, 30], and shows excellent performance in prediction [28].

However, previous studies only gave the functional relationship between the performance and a few
specialized parameters, such as basin plate temperature, glass cover temperature, and feedwater
temperature, which are not convenient for customers to measure. More importantly, the previous
models cannot forecast the production in advance, which is a big challenge.


The production is greatly affected by the weather, and weather forecast data such as air
temperature, humidity, wind, atmospheric pressure, and air quality index are easy to obtain. If a
model could be established between the production and the weather forecast data, it would be a
convenient and effective way to forecast the production.

Production forecasting is significant for promoting the global application of solar stills. Even
in remote areas, conventional weather forecasts are not difficult to obtain. Besides, forecasting
helps to maintain a stable water supply or a controllable desalination capacity. That is, with the
help of forecasting, a proper substitute desalination strategy can be planned and chosen, such as
using electrically powered desalination as compensation.

This work aims to build a model that forecasts the daily production of solar stills from
convenient weather data. The data required to train the model were obtained from experimental
measurements carried out from July 2020 to March 2021. Based on the production and weather data,
the forecasting model was built using the random forest method. To verify the practicability and
accuracy of the model, the determination coefficients were calculated and compared. By applying
the model, the freshwater production of four cities in China was forecasted from conventional
weather data.


2. Experimental systems 


The solar stills consist of a glass cover, basin, foam heat-insulation layer, water feeding tank,
freshwater outlet, and the required measuring instruments, as shown in Fig. 1 (a). The bottom
dimension is 50 × 50 cm. Singh and Tiwari reported that the annual solar still yield reached a
maximum value when the inclination of the condensing glass cover was equal to the latitude of the
place. Thus the glass cover of the solar stills has an inclination angle of 30°, which is the
preferred solar incidence angle at Hangzhou (120.2° E, 30.3° N). The equipment is installed on the
roof of a building in Hangzhou, China. The solar still is placed horizontally with the front facing
south.

The schematic of the solar still is shown in Fig. 1 (b). The solar still has an interfacial
evaporation structure at the bottom and insulation foams at the sidewall (BIF-SS). The BIF-SS
adopts a three-layer composite structure: a floating light-absorbing layer, a water-conducting
layer, and a heat-insulating layer. The light-absorbing layer is made of black deerskin velvet
fiber cloth with 95% solar absorption. The water-conducting layer is made of cotton fiber cloth
with a thickness of about 8 mm and is in contact with seawater through the water-conducting
channel. The sides and bottom are all wrapped with 2 cm thick heat-insulating extruded polystyrene
(XPS) foam board. The thermal conductivity of the XPS board is 0.03 W/(m·K). The freshwater is
collected in the freshwater collection tank and measured manually with a graduated cylinder. The
solar still with the interfacial evaporation structure is designed based on our previous work and
has both high energy efficiency and salt rejection capacity. Meanwhile, a control group was set up
using a solar still with an interfacial evaporation structure at both the bottom and the sidewall
(BSI-SS). The schematic of the BSI-SS is shown in Fig. 1 (c).


Figure 1 Solar stills with an interfacial evaporation structure: (b) at the bottom with insulation
foams at the sidewall (BIF), and (c) at both the bottom and the sidewall (BSI). Labeled parts:
water-conducting layer, heat-insulating layer.


The measurements rely on a series of sensors, which are listed in Table 1. The weather parameters,
including wind speed (Ws), wind direction (Wp), atmospheric pressure (Press.), air temperature (T),
and relative humidity (RH), were recorded every minute. The air quality index (AQI) data were
obtained from the website www.tianqi.com. The recorded weather data of Hangzhou are shown in
Fig. 2, expressed as daily average values. Affected by El Niño, the average temperature was highest
in August, significantly higher than in July. Meanwhile, August was the driest month, with the
lowest average relative humidity. Fig. 2 also shows that the AQI and atmospheric pressure are
higher in winter.
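As a rough illustration of how minute-level records can be reduced to the daily values used later (daily average, Tmax, Tmin), the sketch below groups timestamped readings by calendar day. The function name `daily_summary` and the sample readings are hypothetical and are not taken from the measured dataset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

def daily_summary(records):
    """Group (timestamp, value) pairs by calendar day and summarize each day."""
    by_day = defaultdict(list)
    for stamp, value in records:
        by_day[stamp.date()].append(value)
    return {
        day: {"mean": fmean(vals), "max": max(vals), "min": min(vals)}
        for day, vals in sorted(by_day.items())
    }

# Hypothetical air temperature readings (°C); real logs would hold one row per minute.
readings = [
    (datetime(2020, 8, 1, 6, 0), 27.0),
    (datetime(2020, 8, 1, 14, 0), 36.0),
    (datetime(2020, 8, 1, 22, 0), 30.0),
    (datetime(2020, 8, 2, 6, 0), 26.0),
    (datetime(2020, 8, 2, 14, 0), 34.0),
]
summary = daily_summary(readings)
for day, stats in summary.items():
    print(day, stats)
```

Wind direction is a circular quantity, so a plain arithmetic mean like this one is not appropriate for it; the text does not state how the wind direction was averaged.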


Table 1 The test platform of meteorological data

Name                            Device model      Range          Accuracy    Resolution
Wind speed sensor               011E-MetOne       0-60 m/s       ±0.1 m/s    0.04 m/s
Wind direction sensor           020C-MetOne       0-360°         ±3°         <0.1°
Environmental humidity sensor   HC2S3-Campbell    0-100% RH      ±0.8% RH    0.1% RH
Atmospheric pressure sensor     CS106-Campbell    500-1100 kPa   ±0.3 kPa    ±0.1 kPa
Ambient temperature sensor      110PV-Campbell    -40-135°C      ±0.2°C      --
Data collector                  CR100-Campbell    0-4200 g       0.01 g      --
Figure 2 The recorded weather data of Hangzhou, used as input to the model predicting the
production of the solar still. (a) Air temperature. (b) Relative humidity. (c) The air quality
index. (d) Atmospheric pressure.

Fig. 3 (a) shows the hourly production of the BIF-SS on March 9th. The freshwater productivity
gradually increases from 8:00 and peaks at about 0.8 kg/m²·h at 12:00. By 20:00, the productivity
is close to 0. Fig. 3 (b) shows the recorded water production of the solar still from July 2020 to
March 2021. The daily production varies with the weather. The freshwater production in August was
the highest, significantly higher than in the other months. The highest daily production is
6.0 kg/m²·day. The weather and production data are listed in supporting materials (SM) I.
Figure 3 (a) Both the accumulated production (red dots) and the hourly production (black squares)
of the BIF solar still on March 9th, 2021. (b) The daily production of BIF-SS measured from July
2020 to March 2021, which is a part of the dataset for building the forecasting model.


3. Machine learning methods 


The forecasting model is established based on the dataset. The solar still dataset is given as
F = {X, y}, where X is the set of input parameters, including Week, Ws, Wp, T, Press., RH, and AQI,
and y is the daily production, the target value corresponding to X.

The basic steps include data preprocessing, model construction, and algorithm optimization. Data
preprocessing refers to scaling the data attributes to a specific range, because attributes with
larger magnitudes would otherwise dominate and degrade the accuracy of the model. The
standardization method (Z-Scale) is used to scale the input parameters. Because the Z-Scale method
is based on the mean and standard deviation of the original data, the relative spacing between
samples is maintained. After data standardization, the RF method is used to establish the
forecasting model. First, the samples are randomly divided into a training set and a test set.
Then, a decision tree is built on each randomly drawn subset of the training data, and each tree
gives a prediction. Last, the predictions of all trees are combined by voting (averaging, for a
continuous target such as daily production) to give the final result. The Bayesian optimization
algorithm (BOA) is used to search for the most appropriate hyper-parameters of the RF model. The
diagram of the forecasting model establishment is shown in Fig. 4 (details in SM II).
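The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the tree depth, number of trees, and feature-subset size are fixed placeholder values (the Bayesian hyper-parameter search is omitted), all function names are hypothetical, and the data are synthetic.

```python
import random
from statistics import fmean, pstdev

def standardize(rows):
    """Z-score every column: (x - mean) / standard deviation."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]   # constant column: avoid dividing by 0
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def sse(values):
    """Sum of squared deviations from the mean, used as the split criterion."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, depth, rng, n_feat):
    """Grow one regression tree; each node tries a random subset of features."""
    if depth == 0 or len(set(y)) == 1:
        return fmean(y)                        # leaf: mean of the targets
    best = None
    for j in rng.sample(range(len(X[0])), n_feat):
        for t in sorted({row[j] for row in X})[1:]:   # both sides stay non-empty
            left = [i for i, row in enumerate(X) if row[j] < t]
            right = [i for i, row in enumerate(X) if row[j] >= t]
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:                           # no sampled feature could be split
        return fmean(y)
    _, j, t, left, right = best
    children = [
        build_tree([X[i] for i in idx], [y[i] for i in idx], depth - 1, rng, n_feat)
        for idx in (left, right)
    ]
    return (j, t, children[0], children[1])

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] < t else right
    return node

def fit_forest(X, y, n_trees=50, depth=4, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n_feat = max(1, len(X[0]) // 3)            # assumed feature-subset size
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 depth, rng, n_feat))
    return forest

def predict(forest, row):
    """Average the predictions of all trees (the regression form of voting)."""
    return fmean([predict_tree(tree, row) for tree in forest])

# Synthetic stand-in data: two inputs, target driven by the first one only.
raw = [[i / 19, (i * 7 % 20) / 19] for i in range(20)]
target = [3.0 if row[0] > 0.5 else 1.0 for row in raw]
X = standardize(raw)   # for brevity; in practice fit the scaling on the training set only
order = list(range(len(X)))
random.Random(1).shuffle(order)
test_idx, train_idx = order[:4], order[4:]     # 20% held out as the test set
forest = fit_forest([X[i] for i in train_idx], [target[i] for i in train_idx])
for i in test_idx:
    print(f"measured {target[i]:.1f}  predicted {predict(forest, X[i]):.2f}")
```

In practice a library implementation would be preferred, and the fixed values of `n_trees`, `depth`, and `n_feat` are the kind of hyper-parameters that the BOA step would tune.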


Figure 4 The flowchart of making the forecasting model, which includes data preprocessing, model
construction, and algorithm optimization. Flowchart labels: experiments, China meteorological data
center, Bayesian optimization algorithm.


4. Results and discussions 


Forecasting results of RF model

Fig. 5 shows the performance of the forecasting model for three different sizes of the testing
dataset. The determination coefficient (R²) and mean square error (MSE) are used to evaluate the
performance of the forecasting model (details in SM II). As the training dataset grows and the
testing dataset shrinks, R² of the random forest model remains at a high level and improves
gradually, which indicates that the model possesses good convergence. The values of R² and MSE are
0.935 and 0.209, respectively, when the test size is 10%.
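For reference, the two metrics can be computed as in the short sketch below, assuming the standard definitions R² = 1 - SS_res/SS_tot and MSE = mean of squared residuals (the exact formulas used here are given in SM II, which is not reproduced). The sample values are hypothetical.

```python
from statistics import fmean

def mse(y_true, y_pred):
    """Mean of the squared differences between measured and predicted values."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def r2_score(y_true, y_pred):
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical daily productions (kg/m² per day), not measured data.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"R2 = {r2_score(measured, predicted):.3f}, MSE = {mse(measured, predicted):.3f}")
```

An R² of 1 means the predictions reproduce the measurements exactly, while a model that always predicts the mean of the measurements scores 0.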
Figure 5 For BIF-SS, the predicted values versus the measured values for different sizes of the
testing dataset: (a) 10%, (b) 20%, and (c) 30% of the dataset. The value of R² is much higher than
that of multiple linear regression (0.767).

The value of R² is much higher than that of multiple linear regression (0.767) and traditional
models. For example, Kumar developed a thermal model to predict the performance of solar stills
for different ranges of the Grashof number, and the R² of Kumar's model was only 0.829. In
Panchal's work, the main parameters of the theoretical model were water temperature and inner
glass cover temperature, and the R² of the model was 0.847. The results in Fig. 5 indicate that
the RF method has a much higher prediction accuracy than traditional models. (Details of
calculation in SM III.)


Correlation between productions and weather parameters

The degree of correlation between the production of solar stills and the conventional weather
forecasting parameters was evaluated. The random forest method was used because of its superior
forecasting performance, and the results are shown in Fig. 6. The three highest-ranked parameters
are the daily highest temperature (Tmax), relative humidity (RH), and the daily lowest temperature
(Tmin), whose values are 41%, 20%, and 18%, respectively. Moreover, Press., Ws, and Wp have
similar importance values in the range of 2.3% to 3.6%, which is close to that of the random
orders (2.1%). The random orders were generated randomly, so they have no correlation with the
production and serve as a reference value for comparison.
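The role of the random reference can be illustrated with a small sketch: a random column is appended to the inputs before training, and afterwards each parameter's importance is compared with the importance assigned to that column. The helper names and the factor-of-two margin are assumptions made for illustration; only the importance values quoted in the text are used, and Press., Ws, and Wp are left out because they are reported only as a range.

```python
import random

def add_random_column(rows, seed=0):
    """Append one uniformly random value to every input row (the reference)."""
    rng = random.Random(seed)
    return [list(row) + [rng.random()] for row in rows]

def above_baseline(importance, baseline="Random", margin=2.0):
    """List features whose importance exceeds margin times the random column's."""
    floor = margin * importance[baseline]
    keep = [name for name, value in importance.items()
            if name != baseline and value > floor]
    return sorted(keep, key=importance.get, reverse=True)

# Importance shares quoted in the text; "Random" is the random-orders reference.
reported = {"Tmax": 0.41, "RH": 0.20, "Tmin": 0.18, "AQI": 0.06, "Random": 0.021}
print(above_baseline(reported))
```

With the assumed margin of 2, any parameter scoring below 4.2% would be treated as indistinguishable from noise, which is consistent with the text's reading of Press., Ws, and Wp (2.3% to 3.6%).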

These results indicate that Tmax, RH, and Tmin are the three factors most strongly correlated with
the production. Tmax has the highest correlation value. When the temperature rises due to
increasing solar radiation, the evaporation rate increases. The relative humidity has a high
degree of correlation because it directly reflects weather conditions and solar radiation: when
the air humidity is high, it is usually cloudy or rainy and the radiation intensity is low.
Besides the three most correlated factors, the air quality index has an importance value of 6%.
AQI can also affect the solar radiation energy. A high AQI means that the air quality is poor, and
the particulate matter scatters the sunlight, which reduces the solar energy entering the solar
stills.
Figure 6 The degree of correlation between the production of solar stills and the conventional
weather forecasting parameters. The three parameters with the highest values are the daily highest
temperature (Tmax), relative humidity (RH), and the daily lowest temperature (Tmin), whose values
are 41%, 20%, and 18%, respectively.

Forecasting results between different types of solar still

A control group was set up to verify the accuracy and applicability of the RF prediction method.
The solar evaporation experiments were carried out on the solar still with an interfacial
evaporation structure at both the bottom and the sidewall (BSI-SS).

Fig. 7 shows the prediction performance based on the production data of the BSI-SS. The results
are comparable to those of the BIF-SS. As shown in Fig. 8, 20% of the production data is used as
the test set. The forecasting models based on the two types of solar stills both show high
prediction accuracy: the R² values of the BIF-SS and BSI-SS are 0.927 and 0.939, respectively. The
results verify the high accuracy and applicability of the forecasting model.
Figure 7 The results of the predicting performance between different test sizes (BSI-SS).

Figure 8 Comparison of the production values between the measured and the predicted, using 20% of
production data as the test set. (a) BIF-SS; (b) BSI-SS.


5. Applying the forecasting model


By applying the forecasting model, the freshwater production of four Chinese cities (Wuhan, Hefei,
Chongqing, and Linzhi) was predicted from the weather data. The weather data from July 2020 to
February 2021, including air temperature, atmospheric pressure, wind speed and direction, relative
humidity, and air quality index, were obtained from the China meteorological data center
(http://data.cma.cn). The four cities were chosen because they have similar latitudes to Hangzhou
(~30° N). Then, the daily productions from July 2020 to February 2021 were predicted based on the
daily weather data.

The average daily productions of the four cities are shown in Fig. 9. The average daily
productions in Hefei and Wuhan are similar to that of Hangzhou, 2.18 kg/m² per day, because the
three cities have similar latitudes and are located close to the Yangtze River, so their climates
are similar. The production of Chongqing is the lowest among these cities, 2.1 kg/m² per day,
because Chongqing is foggy all year round and its solar radiation intensity is lower than that of
the other cities. The production of Linzhi is the highest, 2.48 kg/m² per day, because Linzhi is
located on the Qinghai-Tibet Plateau and has a high altitude (3.1 km) and high insolation. The
predicted daily production of the three cities is shown in SM IV.
Figure 9 The predicted average daily production of five cities in China by using the RF model. The
production of Linzhi is the highest due to its high elevation and insolation. Chongqing is the
lowest due to its dense mist and lower radiation.


Furthermore, the daily solar insolation data were obtained from the China meteorological data
center to analyze the prediction accuracy. Because there are no measured values of production for
these cities, a gauge is needed to check the predicted values. As shown above, solar insolation is
not used in building the model; that is, the values of solar insolation are independent of the
predicted production. Generally, the production is in direct proportion to the solar insolation,
so the insolation can be used as a gauge to check the predicted values. Fig. 10 shows the
comparison of the predicted daily production and the solar insolation from July 2020 to February
2021 in Wuhan. Because the radiation intensity and temperature are higher in summer and lower in
winter, the production should be higher in summer and lower in winter. The trend of the predicted
production is similar to that of the daily solar insolation, and the correlation coefficient of
the two data sets is 0.868, which indicates that the forecasting model possesses high accuracy.
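The check described above amounts to computing a correlation coefficient between two daily series. The sketch below assumes the Pearson coefficient is meant (the text does not name the type), and the two series are hypothetical stand-ins for predicted production and insolation.

```python
from math import sqrt
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily values, not data from this work.
insolation = [22.0, 18.5, 9.0, 14.0, 5.5, 20.0]   # e.g. MJ/m² per day
production = [4.1, 3.6, 1.9, 2.2, 1.4, 3.3]       # e.g. kg/m² per day
print(f"r = {pearson_r(insolation, production):.3f}")
```

A value near 1 means the predicted production rises and falls with the insolation; it measures agreement in trend rather than agreement in absolute production.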
Figure 10 A comparison of the predicted daily production and the solar insolation from July 2020
to February 2021 in Wuhan. The correlation coefficient of the predicted daily production and the
solar insolation is 0.868, which indicates that the forecasting model possesses high accuracy.


6. Conclusions 


In conclusion, a forecasting model is built that can forecast freshwater production from
convenient weather data. To collect the dataset, a series of solar evaporation experiments was
carried out from July 2020 to March 2021 on two types of solar stills, during which the production
and weather data were recorded. Then, the model to forecast solar still production was established
using the random forest method and optimized by the Bayesian optimization algorithm.

The forecasting model has high accuracy. The determination coefficient (R²) on the training
dataset and test dataset reaches 0.946 and 0.935, respectively, when the test size is 10% of the
production data. A control group was set up to verify the accuracy and applicability of the RF
prediction method, and the determination coefficients of the two types of solar stills are
calculated as 0.935 and 0.929.

To identify closely related parameters, the degree of correlation between the production and the
weather parameters was also calculated. The three most correlated parameters are maximum air
temperature, relative humidity, and minimum air temperature, whose degrees of correlation are 41%,
20%, and 18%, respectively.

By applying the model, the productions of four cities were predicted from their weather data. To
verify the reliability of the predicted results, they were compared with the daily solar
insolation data. The correlation coefficient between the predicted production and the solar
insolation is 0.868, indicating that the predictions have high accuracy.

The forecasting model could greatly promote the global application of solar stills.